K-similar strings

Time: O(NxN!div(C_A!x_xC_Z!)); Space: O(NxN!div(C_A!x_xC_Z!)); hard

Strings A and B are K-similar (for some non-negative integer K) if we can swap the positions of two letters in A exactly K times so that the resulting string equals B.

Given two anagrams A and B, return the smallest K for which A and B are K-similar.

Example 1:

Input: A = “ab”, B = “ba”

Output: 1

Example 2:

Input: A = “abc”, B = “bca”

Output: 2

Example 3:

Input: A = “abac”, B = “baca”

Output: 2

Example 4:

Input: A = “aabc”, B = “abca”

Output: 2

Notes:

  • 1 <= len(A) == len(B) <= 20

  • A and B contain only lowercase letters from the set {‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’}

Approach Framework

Explanation

We’ll call the underlying graph of the problem, the graph with 6 nodes ‘a’, ‘b’, …, ‘f’ and the edges A[i] -> B[i]. Our goal is for this graph to have only self-edges (edges of the form a -> a.) Let’s make some deductions about how swaps between A[i] and A[j] affect this graph, and the nature of optimal swap schedules.

If A = ‘ca…’ and B = ‘ab…’, then the first two edges of the underlying graph are c -> a and a -> b; and a swap between A[1] and A[0] changes these two edges to the single edge c -> b. Let’s call this type of operation ‘cutting corners’. Intuitively, our optimal swap schedule always increases the number of matches (A[i] == B[i]s) for each swap, so cutting corners is the only type of operation we need to consider. (This is essentially the happy swap assumption, proved in Couples Holding Hands)

Now consider any cycle decomposition of the underlying graph. (This decomposition (or the number of cycles), is not necessarily unique.) Through operations of cutting corners, we’ll delete all the (non-self) edges. Each cycle of length k requires k-1 operations to delete. Thus, the answer is just the minimum possible value of ∑(Ck − 1), where C1, … Ck are the lengths of the cycles in some cycle decomposition of the underlying graph. This can be re-written as (Number of non-self edges) - (Number of cycles). Hence, we want to maximize the number of cycles in a cycle decomposition of the underlying graph.

1. Brute Force with Dynamic Programming

Intuition and Algorithm Let P1, P2, … be possible cycles of the underlying graph GG. Let’s attempt to write G=∑kixPi for some constants ki. Then, the number of cycles is ∑ki. We can represent G and Pi as the number of directed edges from node X to Y. For example, if P1 is the cycle a -> b -> d -> e -> a, then P1 is a -> b plus b -> d plus d -> e plus e -> a. This can be represented as a column vector possibles[0] of 1s and 0s, with four 1s, (each possibles[0][i] == 1 represents the ith directed edge being there (and having quantity 1)). Similarly, the graph G can also be represented as a column vector.

This sets the stage for dynamic programming. For each graph G, represented as a column vector, say we want to find numCycles(G). We can take all possible cycles C, and check if G contains C. If it does, then a candidate answer is 1 + numCycles(G - C).

It should also be noted that maximizing the number of cycles cannot be done in a greedy way, ie. by always removing the lowest size cycle. For example, consider the graph with edges a -> b -> c -> a, b -> d -> e -> b, and c -> e -> f -> c. Those form cycles, and there is a fourth 3-cycle b -> c -> e -> b. If we remove that cycle first, then we would have only two cycles; but if we remove the first 3 cycles, then we would have three cycles.

[8]:
import itertools

class Solution1(object):
    """
    Time: O(2^(N+W)), where N is the length of the string, and W is the length of the alphabet.
    Space: O(2^(N+W))
    """
    def kSimilarity(self, A, B):
        """
        :type A: str
        :type B: str
        :rtype: int
        if A == B:
            return 0

        N = len(A)
        alphabet = 'abcdef'
        pairs = [(a, b) for a in alphabet for b in alphabet if a != b]
        index = {p: i for i, p in enumerate(pairs)}

        count = [0] * len(index)
        for a, b in zip(A, B):
            if a != b:
                count[index[a, b]] += 1

        seen = set()
        for size in range(2, len(alphabet) + 1):
            for cand in itertools.permutations(alphabet, size):
                i = cand.index(min(cand))
                seen.add(cand[i:] + cand[:i])

        possibles = []
        for cand in seen:
            row = [0] * len(alphabet) * (len(alphabet) - 1)
            for a, b in zip(cand, cand[1:] + cand[:1]):
                row[index[a, b]] += 1
            possibles.append(row)

        ZERO = tuple([0] * len(row))
        memo = {ZERO: 0}
        def solve(count):
            if count in memo: return memo[count]

            ans = float('-inf')
            for row in possibles:
                count2 = list(count)
                for i, x in enumerate(row):
                    if count2[i] >= x:
                        count2[i] -= x
                    else: break
                else:
                    ans = max(ans, 1 + solve(tuple(count2)))

            memo[count] = ans
            return ans

        return sum(count) - solve(tuple(count))
[9]:
s = Solution1()
A = "ab"
B = "ba"
assert s.kSimilarity(A, B) == 1
A = "abc"
B = "bca"
assert s.kSimilarity(A, B) == 2
A = "abac"
B = "baca"
assert s.kSimilarity(A, B) == 2
A = "aabc"
B = "abca"
assert s.kSimilarity(A, B) == 2